Tessellation and Clustering by Mixture Models and Their Parallel Implementations
Abstract
Clustering and tessellation are basic tools in data mining. The k-means and EM algorithms are two of the most important algorithms in mixture-model-based clustering and tessellation. In this paper, we introduce a new clustering strategy that shares features with both the EM and k-means algorithms. Our methods also lead to more general tessellations of a spatial region with respect to a continuous and possibly anisotropic density distribution. Moreover, we propose probabilistic methods for constructing the clusterings and tessellations that correspond to a continuous density distribution. Numerical examples are presented to demonstrate the effectiveness of the new approach. We also discuss the parallel implementation and performance of our algorithms on distributed-memory systems.
Similar resources
Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models
Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectation-conditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues around the implementation of this famil...
An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models
Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
Scalable Data Clustering using GPU Clusters
The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry data, is very time consuming on a single CPU. Fortunately these techniques lend themselves naturally to large scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA’s CUDA framework and Tesla arch...
Publication date: 2004